gh-139772: Add PyDict_NewPresized() function #139773

vstinner · 2025-10-08T13:29:40Z

Issue: [C API] Add PyDict_NewPresized() function #139772

📚 Documentation preview 📚: https://cpython-previews--139773.org.readthedocs.build/

Doc/c-api/dict.rst

vstinner · 2025-10-11T21:58:03Z

I'm converting this PR to a draft for now, since the API seems to be misused by third-party projects. I proposed PyDict_FromItems(), which is a different abstraction: #139963

vstinner · 2025-10-12T12:42:01Z

I rewrote the PR to add a unicode_keys parameter: PyObject* PyDict_NewPresized(Py_ssize_t size, int unicode_keys).

methane · 2025-10-13T07:48:57Z

There are two news entries.

vstinner · 2025-10-13T11:36:24Z

Benchmark on PyDict_New() vs PyDict_NewPresized() with Unicode keys:

| Benchmark | new | presized |
|---|---|---|
| dict-10 | 2.69 us | 2.62 us: 1.03x faster |
| dict-100 | 29.6 us | 27.5 us: 1.08x faster |
| dict-1,000 | 301 us | 283 us: 1.06x faster |
| dict-10,000 | 3.50 ms | 3.18 ms: 1.10x faster |
| Geometric mean | (ref) | 1.05x faster |

Benchmark hidden because not significant (1): dict-1

Code:

diff --git a/Modules/_testcapimodule.c b/Modules/_testcapimodule.c
index 4e73be20e1b..a1eaed01178 100644
--- a/Modules/_testcapimodule.c
+++ b/Modules/_testcapimodule.c
@@ -2562,6 +2562,77 @@ toggle_reftrace_printer(PyObject *ob, PyObject *arg)
     Py_RETURN_NONE;
 }
 
+
+static PyObject *
+bench_dict_new(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject *d = PyDict_New();
+        if (d == NULL) {
+            return NULL;
+        }
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            PyObject *key = PyUnicode_FromFormat("%zi", i);
+            assert(key != NULL);
+
+            PyObject *value = PyLong_FromLong(i);
+            assert(value != NULL);
+
+            assert(PyDict_SetItem(d, key, value) == 0);
+        }
+
+        assert(PyDict_Size(d) == size);
+        Py_DECREF(d);
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+
+static PyObject *
+bench_dict_presized(PyObject *ob, PyObject *args)
+{
+    Py_ssize_t size, loops;
+    if (!PyArg_ParseTuple(args, "nn", &size, &loops)) {
+        return NULL;
+    }
+
+    PyTime_t t1, t2;
+    PyTime_PerfCounterRaw(&t1);
+    for (Py_ssize_t loop=0; loop < loops; loop++) {
+        PyObject *d = PyDict_NewPresized(size, 1);
+        if (d == NULL) {
+            return NULL;
+        }
+
+        for (Py_ssize_t i=0; i < size; i++) {
+            PyObject *key = PyUnicode_FromFormat("%zi", i);
+            assert(key != NULL);
+
+            PyObject *value = PyLong_FromLong(i);
+            assert(value != NULL);
+
+            assert(PyDict_SetItem(d, key, value) == 0);
+        }
+
+        assert(PyDict_Size(d) == size);
+        Py_DECREF(d);
+    }
+    PyTime_PerfCounterRaw(&t2);
+
+    return PyFloat_FromDouble(PyTime_AsSecondsDouble(t2 - t1));
+}
+
+
 static PyMethodDef TestMethods[] = {
     {"set_errno",               set_errno,                       METH_VARARGS},
     {"test_config",             test_config,                     METH_NOARGS},
@@ -2656,6 +2727,8 @@ static PyMethodDef TestMethods[] = {
     {"test_atexit", test_atexit, METH_NOARGS},
     {"code_offset_to_line", _PyCFunction_CAST(code_offset_to_line), METH_FASTCALL},
     {"toggle_reftrace_printer", toggle_reftrace_printer, METH_O},
+    {"bench_dict_new", bench_dict_new, METH_VARARGS},
+    {"bench_dict_presized", bench_dict_presized, METH_VARARGS},
     {NULL, NULL} /* sentinel */
 };

bench_new.py:

import pyperf
import functools
import _testcapi
runner = pyperf.Runner()
for size in (1, 10, 100, 1_000, 10_000):
    func = functools.partial(_testcapi.bench_dict_new, size)
    runner.bench_time_func(f'dict-{size:,}', func)

bench_presized.py:

import pyperf
import functools
import _testcapi
runner = pyperf.Runner()
for size in (1, 10, 100, 1_000, 10_000):
    func = functools.partial(_testcapi.bench_dict_presized, size)
    runner.bench_time_func(f'dict-{size:,}', func)

vstinner · 2025-10-13T11:38:27Z

I opened capi-workgroup/decisions#80 with the C API Working Group for this API.

vstinner · 2025-10-13T12:34:46Z

Benchmark on PyDict_New() vs PyDict_NewPresized() with integer keys:

| Benchmark | new | presized |
|---|---|---|
| dict-1 | 294 ns | 301 ns: 1.02x slower |
| dict-10 | 2.61 us | 2.51 us: 1.04x faster |
| dict-100 | 26.1 us | 24.8 us: 1.05x faster |
| dict-1,000 | 260 us | 250 us: 1.04x faster |
| dict-10,000 | 3.07 ms | 2.78 ms: 1.10x faster |
| Geometric mean | (ref) | 1.04x faster |
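
The "Geometric mean" row summarizes the per-size speedups. As a rough cross-check, and assuming the summary is simply the geometric mean of the new/presized timing ratios over the five rows listed (how pyperf derives it internally is not shown in this thread), it can be recomputed from the posted numbers:

```python
from statistics import geometric_mean

# (new, presized) timings in seconds, copied from the integer-key table above.
timings = {
    "dict-1": (294e-9, 301e-9),
    "dict-10": (2.61e-6, 2.51e-6),
    "dict-100": (26.1e-6, 24.8e-6),
    "dict-1,000": (260e-6, 250e-6),
    "dict-10,000": (3.07e-3, 2.78e-3),
}

# A ratio above 1 means the presized variant was faster for that size.
speedups = {name: new / presized for name, (new, presized) in timings.items()}

# Geometric mean of the ratios (assumed definition of the summary row).
overall = geometric_mean(list(speedups.values()))

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geometric mean: {overall:.2f}x")  # prints "geometric mean: 1.04x"
```

The recomputed value rounds to the 1.04x reported in the table, so the small slowdown for dict-1 is outweighed by the gains at larger sizes.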

davidhewitt · 2025-10-16T12:53:52Z

This seems useful to me for PyO3 👍

I am unsure how reliably we will be able to use the unicode_keys hint. My feeling is that wherever we're confident about the key types, we could have used the proposed PyDict_FromItems instead.

vstinner · 2025-10-16T15:02:12Z

> I am unsure how reliably we will be able to use the unicode_keys hint. My feeling is that wherever we're confident about the key types, we could have used the proposed PyDict_FromItems instead.

Correct.

If you know your input data, you can set the unicode_keys hint in advance, before consuming the iterator. You can use PyDict_NewPresized() in this case.

If you don't know your input data, you might need to consume the iterator and store keys and values in a temporary array, and then call PyDict_FromItems() which computes the unicode_keys hint for you.
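
The second case can be modelled in Python. This is only a conceptual sketch: `compute_unicode_keys_hint` and `build_dict_two_pass` are hypothetical names, the exact-`str` check is an assumption about how such a hint might be derived, and a plain `dict` stands in for the dict that the C function would create.

```python
from typing import Any, Dict, Iterable, List, Tuple

Item = Tuple[Any, Any]


def compute_unicode_keys_hint(items: Iterable[Item]) -> int:
    """Return 1 if every key is exactly a str, else 0.

    Hypothetical model of the hint; the exact-type check is an assumption.
    """
    return int(all(type(key) is str for key, _ in items))


def build_dict_two_pass(pairs: Iterable[Item]) -> Tuple[Dict[Any, Any], int]:
    """Buffer the input, derive the hint, then build the dict."""
    # Pass 1: consume the iterator into a temporary array.
    buffered: List[Item] = list(pairs)
    # The hint can only be computed once all keys have been seen.
    hint = compute_unicode_keys_hint(buffered)
    # Pass 2: build the dict from the buffered items (stand-in for the C call).
    return dict(buffered), hint


result, hint = build_dict_two_pass(iter([("a", 1), ("b", 2)]))
mixed_result, mixed_hint = build_dict_two_pass(iter([("a", 1), (2, "b")]))
```

The temporary list in the first pass is the extra allocation this approach costs when the key types are not known up front.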

davidhewitt · 2025-10-16T19:28:31Z

This seems the wrong way around to me as a user: if I don't know my input data, I'd rather not collect it into a temporary array, since it could be a large dataset and therefore a big temporary allocation.

If I know the input data, I was thinking I would even be able to allocate the items in stack memory before calling PyDict_FromItems.

davidhewitt · 2025-10-16T19:29:32Z

Or are you saying that it is more efficient to use PyDict_NewPresized and repeated calls to PyDict_SetItem than to use PyDict_FromItems?

vstinner · 2025-10-16T21:15:11Z

> Or are you saying that it is more efficient to use PyDict_NewPresized and repeated calls to PyDict_SetItem than to use PyDict_FromItems?

Oh, I didn't know which function was faster, so I ran benchmarks: #139963 (comment). PyDict_FromItems() is faster than PyDict_NewPresized()+PyDict_SetItem().

vstinner requested review from AA-Turner, markshannon and methane as code owners October 8, 2025 13:29

bedevere-app bot mentioned this pull request Oct 8, 2025

[C API] Add PyDict_NewPresized() function #139772

Open

bedevere-app bot added the awaiting core review label Oct 8, 2025

methane reviewed Oct 9, 2025

Doc/c-api/dict.rst (outdated, resolved)

vstinner marked this pull request as draft October 11, 2025 21:57

bedevere-app bot removed the awaiting core review label Oct 11, 2025

vstinner force-pushed the dict_presized branch 3 times, most recently from eb555c6 to 8bb9715 Compare October 12, 2025 12:40

pythongh-139772: Add PyDict_NewPresized() function

8a61f5a

vstinner force-pushed the dict_presized branch from 8bb9715 to 8a61f5a Compare October 12, 2025 12:41

vstinner mentioned this pull request Oct 12, 2025

gh-139772: Add PyDict_FromItems() function #139963

Open

Fix the doc

4b34da9

methane approved these changes Oct 13, 2025

bedevere-app bot added the awaiting merge label Oct 13, 2025

vstinner marked this pull request as ready for review October 13, 2025 10:06

bedevere-app bot added awaiting core review and removed awaiting merge labels Oct 13, 2025

vstinner mentioned this pull request Oct 13, 2025

Add PyDict_NewPresized() function capi-workgroup/decisions#80

Open

5 tasks
